Skip to content

Conversation

@GarrettBeatty
Copy link
Contributor

@GarrettBeatty GarrettBeatty commented Nov 17, 2025

Description

Add DownloadWithResponse api which returns the response metadata to the user as well as performing multi part download behind the scenes. This api will download to a file.

Progress tracking is not included in this PR. that will be included in the next pr #4139

Motivation and Context

#3806

Testing

  1. unit testing
  2. integration test
  3. dry run - in progress
  4. manual testing

DownloadWithResponseAsync api (includes multipart)


Total bytes per run: 5,368,709,120

Run:1 Secs:2.905885 Gb/s:14.780240
Run:2 Secs:2.144035 Gb/s:20.032173
Run:3 Secs:2.044687 Gb/s:21.005499
Run:4 Secs:2.065461 Gb/s:20.794235
Run:5 Secs:2.072479 Gb/s:20.723814
Run:6 Secs:2.057247 Gb/s:20.877259
Run:7 Secs:2.058968 Gb/s:20.859804
Run:8 Secs:2.075279 Gb/s:20.695853
Run:9 Secs:2.059316 Gb/s:20.856282
Run:10 Secs:2.073549 Gb/s:20.713126

vs old DownloadAsync api (non multipart)


Total bytes per run: 5,368,709,120

Run:1 Secs:69.819019 Gb/s:0.615157
Run:2 Secs:72.889799 Gb/s:0.589241
Run:3 Secs:63.697331 Gb/s:0.674277
Run:4 Secs:69.574499 Gb/s:0.617319
Run:5 Secs:62.779566 Gb/s:0.684135
Run:6 Secs:64.704739 Gb/s:0.663779
Run:7 Secs:65.786796 Gb/s:0.652862
Run:8 Secs:62.962942 Gb/s:0.682142
Run:9 Secs:65.438255 Gb/s:0.656339

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have read the README document
  • I have added tests to cover my changes
  • All new and existing tests passed

License

  • I confirm that this pull request can be released under the Apache 2 license

@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/multifile branch 2 times, most recently from 9e63b12 to fa59362 Compare November 17, 2025 20:58
Copilot finished reviewing on behalf of GarrettBeatty November 17, 2025 21:01
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a new DownloadWithResponseAsync API for the AWS S3 SDK that enables multipart downloads with concurrent part downloads for improved performance. The implementation follows the S3 Express Protocol (SEP) specifications for atomic file operations and composite checksum handling.

Key Changes:

  • Adds new DownloadWithResponseAsync methods to TransferUtility that return response metadata
  • Implements multipart download infrastructure with concurrent part downloads
  • Adds atomic file handling using temporary files with .s3tmp.{uniqueId} pattern
  • Includes comprehensive unit and integration tests

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
TransferUtility.async.cs Added DownloadWithResponseAsync methods for async multipart downloads
TransferUtility.sync.cs Added synchronous DownloadWithResponse wrapper methods
MultipartDownloadCommand.cs Core orchestration logic for multipart download execution
MultipartDownloadCommand.async.cs Async implementation of multipart download with response mapping
FilePartDataHandler.cs Handles concurrent file writes at specific offsets for downloaded parts
FileDownloadConfiguration.cs Configuration class for file-based multipart downloads
AtomicFileHandler.cs Manages atomic file operations with temporary files
MultipartDownloadCoordinator.cs Added PrepareAsync call and null checks for discovery methods
IPartDataHandler.cs Added PrepareAsync interface method
BufferedPartDataHandler.cs Added no-op PrepareAsync implementation
MultipartDownloadTestHelpers.cs Added helper methods for test file management and verification
MultipartDownloadCommandTests.cs Unit tests for MultipartDownloadCommand
FilePartDataHandlerTests.cs Unit tests for FilePartDataHandler
FileDownloadConfigurationTests.cs Unit tests for FileDownloadConfiguration
AtomicFileHandlerTests.cs Unit tests for AtomicFileHandler
TransferUtilityDownloadWithResponseTests.cs Integration tests for end-to-end download functionality
9d07dc1e-d82d-4f94-8700-c7b57f872043.json Change log configuration for the new API

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 6 comments.

@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/multifile branch 3 times, most recently from 437adfe to 6bb7a18 Compare November 18, 2025 18:28
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/mpdstream branch 3 times, most recently from fad732f to 0b3ed24 Compare November 19, 2025 17:56
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/multifile branch 2 times, most recently from fd1aba4 to 7269f56 Compare November 19, 2025 19:34
Copilot finished reviewing on behalf of GarrettBeatty November 19, 2025 20:18
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 8 comments.

@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/multifile branch 4 times, most recently from 85641af to 7b371d0 Compare November 19, 2025 22:59
@GarrettBeatty GarrettBeatty marked this pull request as ready for review November 20, 2025 15:53
@GarrettBeatty GarrettBeatty requested a review from normj November 20, 2025 15:53
@GarrettBeatty GarrettBeatty marked this pull request as draft November 21, 2025 06:00
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/multifile branch 2 times, most recently from faca805 to 42d259c Compare November 21, 2025 17:40
Logger.DebugFormat("FilePartDataHandler: Opening file for writing at offset {0} with BufferSize={1}",
offset, _config.BufferSize);

// Open file with FileShare.Write to allow concurrent writes from other threads
Copy link
Contributor Author

@GarrettBeatty GarrettBeatty Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ive created sdk/test/Services/S3/UnitTests/Custom/FilePartDataHandlerConcurrencyTests.cs which validates this is safe (i.e each part creating its own filestream object and writing to a different offset at the same time)

// - Progress tracking with events
// - Size validation (ContentLength vs bytes read)
// - Buffered reading with proper chunk sizes
await response.WriteResponseStreamAsync(
Copy link
Contributor Author

@GarrettBeatty GarrettBeatty Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this to use the same helper function i made reading response stream. I retested again and i see same performance results

Total bytes per run: 5,368,709,120

Run:1 Secs:2.494888 Gb/s:17.215071
Run:2 Secs:1.263313 Gb/s:33.997642
Run:3 Secs:1.120677 Gb/s:38.324749
Run:4 Secs:1.062679 Gb/s:40.416394
Run:5 Secs:1.032120 Gb/s:41.613045
Run:6 Secs:1.117588 Gb/s:38.430699
Run:7 Secs:1.087574 Gb/s:39.491276
Run:8 Secs:1.088066 Gb/s:39.473393
Run:9 Secs:1.120819 Gb/s:38.319901
Run:10 Secs:1.113135 Gb/s:38.584438

// Set ContentRange to represent the entire object: bytes 0-(ContentLength-1)/ContentLength
response.ContentRange = $"bytes 0-{discoveryResult.ObjectSize - 1}/{discoveryResult.ObjectSize}";
// S3 returns null for 0-byte objects, so we match that behavior
if (discoveryResult.ObjectSize == 0)
Copy link
Contributor Author

@GarrettBeatty GarrettBeatty Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a integration test for multi part download to file discovered this issue, so adding here to be consistent

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.

/// <param name="bufferSize">Buffer size for reading/writing operations</param>
/// <param name="cancellationToken">Cancellation token</param>
/// <param name="validateSize">Whether to validate copied bytes match ContentLength</param>
internal async System.Threading.Tasks.Task WriteResponseStreamAsync(
Copy link
Contributor Author

@GarrettBeatty GarrettBeatty Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ive just added this as a helper function for writing to a response stream instead of a file so i can also use the same logic in BufferedPartDataHandler

@GarrettBeatty GarrettBeatty marked this pull request as ready for review November 21, 2025 18:14
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/multifile branch 3 times, most recently from 52ae28f to 407aeb6 Compare November 25, 2025 00:49
}

// Try up to 100 times to create unique file atomically
for (int attempt = 0; attempt < 100; attempt++)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if the for loop is really needed since in reality random should produce a unique id. we can remove this if we think its overkill

// SEP Part GET Step 2: "send the request and wait for the response in a non-blocking fashion"
var firstPartResponse = await _s3Client.GetObjectAsync(firstPartRequest, cancellationToken).ConfigureAwait(false);

if (firstPartResponse == null)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there was some null pointer exception discovered here in a unit test so added this check

Copy link
Member

@normj normj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple minor nit comments that I don't need to rereview for.

#if NETSTANDARD
Stream stream = this.ResponseStream;
#else
Stream stream = new BufferedStream(this.ResponseStream);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does .NET Framework need to wrap the ResponseStream in a a BufferedStream but not .NET Core? If there is a good reason add a comment why it is being done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont have the answer for this but i kept the logic because it was there before, which is why i didnt add a comment because i dont know 100%

/// Generates a cryptographically secure unique identifier of specified length.
/// Uses base32 encoding to avoid filesystem-problematic characters.
/// </summary>
private string GenerateUniqueId(int length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Name GenerateRandomId since you can't guarantee the id being unique and probably don't say it is unique in the comment block above. You have the other block of code calling this to generate a tmp file and retrying up to 100 times till you find a unique value.

@GarrettBeatty GarrettBeatty changed the base branch from gcbeatty/mpdstream to feature/transfermanager November 28, 2025 15:02
@GarrettBeatty GarrettBeatty merged commit 2e268c0 into feature/transfermanager Nov 28, 2025
1 check passed
@GarrettBeatty GarrettBeatty deleted the gcbeatty/multifile branch November 28, 2025 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants